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Abstract. We approach the problem of the computation of persistent homology for large datasets by a 
divide-and-conquer strategy. Dividing the total space into separate but overlapping components, we are 
able to limit the total memory residency for any part of the computation, while not degrading the overall 
complexity much. Locally computed persistence information is then merged from the components and their 
intersections using a spectral sequence generalizing the Mayer- Vietoris long exact sequence. 

We describe the Mayer- Vietoris spectral sequence and give details on how to compute with it. This 
allows us to merge local homological data into the global persistent homology. Furthermore, we detail how 
the classical topology constructions inherent in the spectral sequence adapt to a persistence perspective, as 
well as describe the techniques from computational commutative algebra necessary for this extension. 

The resulting computational scheme suggests a parallclization scheme, and we discuss the communication 
steps involved in this scheme. Furthermore, the computational scheme can also serve as a guideline for which 
parts of the boundary matrix manipulation need to co-exist in primary memory at any given time allowing 
for stratified memory access in single-core computation. The spectral sequence viewpoint also provides easy 
proofs of a homology nerve lemma as well as a persistent homology nerve lemma. In addition, the algebraic 
tools we develop to approch persistent homology provide a purely algebraic formulation of kernel, image 
and cokernel persistence m. 
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1. Introduction 



Topological data analysis [3 15 looks to quantify the qualitative aspects of the geometry of a data set. 
The existence and location of voids in an otherwise densely present figure are among the topological features 
we can try detect. The formalism of homology allows us to do this. It has had a large impact: both in 
geometry and mathematics in general, and more recently in a computational topology and geometry setting. 
The key to the success of homology in computational geometry, as well as in other application fields - is 
the introduction in 13 of persistent homology, a multi-scale approach to the computation of homological 
features given a point cloud of samples. 



The original persistence algorithm 13 has a worst case, a 0{n ) running time . There has been con- 
siderable effort looking for more efhcient algorithms for computing persistence. For special cases, such as 
2-manifolds 13 , very efficient, nearly linear algorithms exist. In the general case, recent progress has shown 
that persistence can be computed in matrix multiplication time 22 or alternatively in an output sensitive 
way 1]. These running times are given in terms of siniplices, so another active line of research is into us- 
ing discrete Morse-theoretic methods or homotopical collapses 11 17 ,20 23 -25 to reduce the size of the 
underlying complex while ensuring the resulting homology computation gives the correct result. These are 
particularly useful when the underlying complex is described by a simpler object such as its graph [9 29 30 
(i.e. for flag complexes). 

The standard algorithm is most often used, since in practice, it has almost linear runtime. The algorithm 
depends on having random access to a representation of the boundary operator. The standard representation 
is as a sparse matrix and during the course of the standard algorithm in higher dimensions, the matrix 
quickly fills-in requiring O(n^) storage space. In practice, the requirement of random access to such a large 
data structure has proven to be the limiting factor in computation. There has been work on distributed 
computation, primarily focusing on ordinary homology in sensor networks and using either coreduction 
techniques as above 10 or via combinatorial Laplacians 26 . The latter approach does not readily extend 



to persistent homology and is much slower in practice than centralized methods. 

In this paper, we present an algorithm which uses a divide-and-conquer approach to computing both 
homology and persistent homology. Rather than computing on the entire complex, we break the computation 
up into smaller pieces. The benefit being that we can compute persistence independently on the each of the 
pieces or patches. The key contirbution of this paper is to show how to merge the results from the individual 
patches into a global quantitjQ Our approach is based on a tool from algebraic topology called spectral 
sequences. Dividing the space up based on a cover the spectral sequence gives the machinery necessary 
to iteratively compute better approximations of the persistent homology of a total space X from the local 
persistence computations in each Ui g U. 

We focus our attention on the underlying ideas based on spectral sequences and the resulting algorithms 
rather than on the analysis of these algorithms, which depend heavily on the cover (i.e. how we break 
up the underlying complex). We partially address the problem of computing covers by presenting some 
straightforward approaches which could be used. Using the developed techniques also gives a more convenient 
form of the Nerve 19 and Persistent Nerve [4j lemmas which are widely used in persistent homology, as well 
as an algebraic description of the image, kernel and cokernel persistence introduced in [Tj. 

The paper is structured as follows: we first introduce the Mayer- Vietoris spectral sequence, which is the 
main algebraic tool that will be used in the paper. We adapt the treatement of 16 to a persistent homology 
context, and provide mathematical descriptions of the spectral sequence computation. Using these structural 
results, in Section[3]we give explicit descriptions of all the needed algorithms; from basic algebraic primitives 
computing kernels, images and cokernels of maps between finitely presented lk[i]-modules to the concrete 
computation of the spectral sequence. Finally, in Section [5] we discuss several applications and corollaries to 
the theory and algorithmics in this paper. In particular, we discuss strategies for parallelizing persistence and 
for computing persistent homology with a stratified memory residence approach; the relationship between our 
algebraic primitives and the corresponding geometric primitives in [t] ; and we give proofs of the homological 
Nerve lemma and of the persistent homological Nerve lemma that avoid the requirement of contractibility 
of all the covering patches and patch intersections. 



The merging step is what has made this kind of approach difficult in the past, particularly for persistence 
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1.1. Background. For an introduction to persistent homology we refer the reader to 12 28 along with 19 
for an introduction to homology within algebraic topology. The main object which we look to compute is 
the persistence module. This is an algebraic object introduced in [3lj , which encodes the homology of a 
filtration, a sequence of increasing simplicial complexes 

= Xo C Xi C . . . C X„_i C Xat = X 

The module encodes the information in all pairwise maps between these spaces. It turns out that, this 
information can be represented by a barcode or persistence diagram which are a set of coordinates which 
represent the birth and death times of non-trivial homology classes in the filtration. 

These have geometrical meaning if the filtration is derived from some geometric quantity. Common 
examples include: the sub-level set filtration with respect to some function defined on the space and the 
filtration based on distances between points in an input point cloud. 

In this paper, we compute homology over coefficients in a field k. In this case, the homology over a 



space is a vector space. However, as shown in 31 , rather than computing the homology for each space 
independently, we can build graded lk[t]-linear chain complexes where the degree of each generator encodes 
the entry of a simplex into the filtration. Refering to this as persistence complex, the authors show that 
persistent homology is equivalent to ordinary homology on the persistence complex with coefficients in k[t]. 

The division of our space will be in terms of a cover. A cover of a space X is an indexed family of sets Ui 
such that X C UiUi. The methods we develop work with arbitrary covers of the space, making the method 
suitable for any division of the underlying space (of course, the performance will depend on the properties of 
the cover). Much of the exposition is given in the language of commutative algebra. This not only simplifies 
the exposition, but gives important insight into how the algorithms should work. Whereever possible, we 
try to put things into the context of previous approaches. 



2. The Mayer- Vietoris spectral sequence 



Spectral sequences have seen numerous uses in classical algebraic topology 21 . In the field of persistent 
homology, an awareness of the role of spectral sequences has been present from the start 31 , but the difficulty 
in computing with spectral sequences has consistently made alternative routes attractive. We do not give a 
complete background on spectral sequences here but rather, we focus our attention on one particular spectral 
sequence, which we use for the parallelization of persistent homology. For reference, spectral sequences are 
discussed extensively in 21 although more accessible introductions can be found in [6] and 18 



The spectral sequence we use can be viewed as generalization of the Mayer- Vietoris long exact sequence 
which given two pieces, relates the homology of their union with the homology of the pieces and their 
intersection. The same way that the long exact sequence encodes an inclusion-exclusion principle on homology 
classes, the corresponding spectral sequence provides the mechanics of an inclusion-exclusion principle for 
deeper intersections than the pairwise afforded by the long exact sequence. Frequently mentioned in the 
literature (inter alia: (2l]), we found the most detailed expositions in [l|[2{[T6]. The spectral sequence builds 
up a space very similar to the blowup complex introduced by Carlsson and Zomorodian in '32' , but uses more 
of the available information in order to knit together local information to a global homology computation. 
Also Carlsson and Zomorodian do not use the same filtration as we depend on here. The following is based 
very loosely on the exposition in jl6j . 

As in 321, we consider a filtered simplicial complex X, with a cover U by filtered subcomplexes X — IJ^gj Vi 
(see Figure |l[b)). The cover can of course be anything, but in practice we wish to divide the full filtered 
complex into almost disjoint pieces so as to maximally parallelize the computation. The condition of covering 
with filtered subcomplexes is trivially fulfilled if we construct the cover on the final space of the filtration. 



We recall the definition of the blowup complex from 32 




X A" 



where A*' is the | J|-simplex with vertices numbered from J. 
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(a) An underlying topolog- 
ical space covered by three 
sets. 



(b) The sets shown as dis- 
joint spaces. 



(c) Blowup complex. 



Figure 1. Construction of the blowup for a triple cover. In the blowup (c), anything in a double 
intersection is multiplied with an edge, and anything in a triple intersection is multiplied by a 2- 
simplex. While only visible in the top right one, all three parts of the curve going through a double 
intersection will be replaced by a triangulated quadrilateral. 
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(a) Double complex expression of the page. 



(b) Typical clement of the total complex. 



Figure 2. Double complex setup. A double complex is a grid as depicted in (a), with modules at 
each grid point, and two families of maps dP and between the modules. For a double complex, 
we require d^d^ + d^d" = as well as d'^d" — and d^d^ = 0. A typical element of the total 
complex is some tuple (qq, . . . , a„), and will map under the total differential to some other element 
{ujo, ■ ■ ■ ,(jj„-i) given by summing up the contributions from each direction. 



Fundamentally, the question is how we can compute the homology of the blowup complex. We approach 
this by separating out the problem into computational components. Whenever n, p and q occur simultane- 
ously, we shall assume that n = p + q. The g-chains in a p-fold intersection of the original cover generate 
p + g-chains in the chain complex of the blowup complex, and thus potentially p + q-homology. 

We use this to organize our computation. Let us write E^^ for the g-chains in a p-fold intersection of 
the original coveij^ The individual chain complexes E^ ^ have boundary maps inherent from their simplicial 
complex structure, and we shall write dp ^ for the map E^ ^ — >• El^_,, and d° for the aggregate map 
E" ^ E" 

For any particular J C D, and any particular j' S J, there is an inclusion map Hjgj ~^ C\j^j\j' ^j- If 
we multiply each such inclusion map with the sign of the corresponding term J ^ J\j' of the nerve complex 
boundary map, and sum the maps up for all j', we get a map ^ : E^ ^ — >■ El_, ^. Aggregating over all 
chains, we have a map : El^, 

A structure like this, with bi-indexed modules E^ ^ and horizontal maps d^ and vertical maps d^ such that 
d^ o d° = 0, d^ o d^ — and d^ o d^ + d^ o d^ — is called a double complex. For double complexes, we can 



'The g notation will be familiar to anyone who has worked with spectral sequences as the 0-page of the spectral sequence 
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generate a straightforward chain complex called the total complex by writing Tot(£'5J^)„ = ®p+q=n 
and introduce as a boundary map the aggregate map dP + d^. 

For our computation of the homology of the blowup complex, we rely on a fundamental theorem from 
algebraic topology [l6] : 

Theorem 1. H^{Tot{E'^„)) = i/,(X). 

While we shall not prove the theorem in its full extent here, we still need the statement of the isomorphism 
for the development of our algorithm. A chain in Ei ^ is a collection of g-chains in the covering patches. 
Such a chain can be mapped into C*X by simply summing up the component chains, included into the full 
complex. Just summing up these components may both introduce and remove homological features present 
in the composite chain - and the exact behaviour of these introductions and removals is encoded into the 
other summands of the direct sum. Thus, the isomorphism identifies a class [(ao, . . . , a„)] with a class [a] 
precisely if a is the result of summing up the components of ag. The chains ai, . . . , capture, in a way 
reminiscent of the inclusion-exclusion principle, the exact ways in which a differs from merely summing up 

The columns in the double complex provide a filtration on Tot(£'p We write E^^ ^ for the subcomplex 
consisting of the p leftmost columns, and write L„^p = im(i7„(Tot(£'<p ^)) Hn{Tot{E'^ ^))) for the image 
of the map induced by the inclusion of a particular filtration step into the entire double complex. Since the 
£;|p , filter E^„, the L„^p filter iJ„(Tot(£;^_ J). Furthermore, a single class [(ao, . . . is in L„^p if and 
only if the class has a representative on the form (ag, . . . , a^, 0, . . . , 0). 

While computing L„_p completely ends up being similar in difficulty to the original homology computation, 
we can consider the computation of just the elements that get added at any particular step of the filtration. 
In other words, if we compute L^^p/ L^^p-i, this tells us about only those elements of -ff*(X) that show up at 
precisely the p-fold intersections. Clearly, taking the direct sum over all p of these quotients, we can compute 
a complete description of the entire homology i?*(X). 

Given some element [(ao: ■ ■ ■ , ctp, 0, . . . , 0)] G Ln,p, the equivalence class it belongs to in Ln^p/ Ln^p-i is 
completely determined by the value of ap. The particular ap representing such an element is not uniquely 
determined, nor is every element oi E^ ^ a. valid choice for some element determining an equivalence class of 
Ln,p/ Ln,p-i- We shall write Z'^^ for the ap that do represent some equivalence class of L„,p/i„_p_i, and we 
shall write for the elements of E^^^ that represent the class of Ln,p/ Ln,p-i- 

The spectral sequence here aims to provide increasingly accurate approximations to these Z"^^ and B"^^^. 
These approximations are computed by introducing the conditions defining Z'^^ and B^^ one at a time. 
The case Z^ Fix some ap € Ep ^. For ap to be in Z^^, there has to be a sequence ap_i, . . . , ao as in 
Figure [s] (a) such that 

(1) (fap = d^Qfp-i = d^ap . . . d^a^ = d^a\ 

These equations express the condition d(ao, . . . , ap, 0, . . . , 0) = in Tot(£'"^). 
We define Z^ ^ to be the set of all ap that fulfill the conditions 

(2) d^ap — d^ap^i — d^ap . . . d^ap^,. = d^ap-r+i 

and in particular, we can compute Z^ ^ recursively as those elements ap of Zp~^ for which there in addition 
to the ap-i, . . . , ap_r+i elements guaranteed to exist since ap G ^p~q^^ there is also an ap_r G ^p-r q+r such 
that d^ap-r = d^ap-r+i- It is clear from these definitions that Z^ ^ — Z^^. 

The case Bp^- As above, we fix ap G iJp^. In order for ap to be in B^^ C Z^^, we additionally need a 
collection of elements /3p, . . . , /3„+i as in Figure [s] (b) such that 

(3) dO/3p + di/3p+i = ap + = ... d% + di/3„+i=0 

The simplest case when this happens is if there is some /3p such that d^/Jp = ap. In this case we can 
choose /3j = for all other We write Bp ^ for the vector space of all such ap. 

In general, we shall define Bp ^ to be the set of ap for which there are /3p, . . . , /3p+r satisfying the equations 

d% + dVp+i = ttp d"/3p+i + d^f3p+2 = ... d°(3p+r-i+d^/3p+r = d°/3p+, = 
In particular ap G Bp ^ if ap G Sp^^ or if there are /3p, . . . , /3p+r satsifying all these equations. 
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(a) Condition for a cycle: {ao, . . . , 02, 0, . . . , 0) is a 
cycle if there are ti;o,t»;i such that d^ao = uiq = d^ai, 
d^ai = oJi = d^a2 and dPa2 = 0. 



(b) Condition for a boundary: {ao, . . . , 0:2, 0, . . . , 0) is 
a boundary if in addition to the uJo,^i in (a), there 
are also /32, ■ • • , such that ^",32 + d^/93 = «2, d'^^s + 
= 0, d°/34 + rfift = and (trivially) d^/^s = 0. 



Figure 3. Cycles and boundaries in Ep ^. 

Zp ^ and Bp ^ as kernel and image of a differential. Suppose ap S ^. Then there exist ap-i, . . . , Qp-r 
as above. Let Up^r-i = d^ctp-r G Z'p_^_i ^_)_^. This element is well-defined, depending only on Up, up to a 
sequence of error terms for all the ap-i, each of which is guaranteed to be in the appropriate Bp_^ Hence, 
the map [ap] i— [wp-^-i] is a well-defined map : Zp^^/Bp ^ — ^p_r-i,g+r/^p-r-i,g+r- This collection of 
quotient modules is often denoted Ep^^ = Zp^^/Bp^^ and called the i?''-page. 

The elements ap, . . . , ap_r are an appropriate set of /Sj to witness u/p^r-i € ^p-r-i q+r- Re-indexing, we 
can conclude that ap G -5^+^ if and only if [ap] = d^[f5p+r+i] for some /3p+r+i S Zp^^^i ,j_^. 

Finally, if Wp-r-i G ^p_r-i,q-r' equivalently if (i''[ap] = [0], then it follows that ap £ Zp^^. 

All these statements can be easily proven by diagram chases. 

If at any stage of the computation, all the maps d^ : ^ El ^ are the zero-map, then the corresponding 
Zl ^ and Bl ^ already satisfy all their conditions, and so for that r, it already holds that Z^ = Z^^ and 
Bl ^, — B^^. Many proofs in algebraic topology work by proving that for a low value of r, the map d^ 
vanishes, eliminating the need to construct the higher index maps. In Section [5. 3| we shall see an application 
of this principle. 

3. Algorithms 

We shall state algorithms here both for the underlying algebraic operations, as well as for the resulting 
spectral sequence and persistence computations. 

3.1. Algebraic operations. In algebra textbooks written with a sufficiently general approach to linear 
algebra, the development of matrix arithmetic and i ts us es in solving linear equations will be done in a 
manner appropriate for a generic Euclidean domain 27\^ 
field, further Euclidean domains include Z, as well as k[, 
facts hold: any submodule of a free module is free, and Gaussian elimination is an appropriate algorithm 
to compute matrix ranks, function kernels and images, reducing bases to remove redundancies and to form 
appropriate reduction systems. 

In light of this, we draw the reader's attention to the implications for persistence. A persistence chain 
complex is a free graded lk[t]-module with a generator in degree k for each simplex that enters the filtered 
chain complex in the fcth filtration step. Persistent homology is a finitely presented graded (k[t]-module, with 
generators given by a cycle basis and relations given by a boundary sub-basis of the cycle basis. 

We can assume - especially in the topological case - that if some t'' J2 •^i*'''^^ is a boundary basis element, 
then the cycle was a cycle already k time-steps earlier, represented by ^ XjPnij. We shall represent finitely 
presented modules precisely as a segregated pair of bases: a generators basis and a separate relations basis. 
Because of this nice behaviour of the generators corresponding to specific relations, we can further allow 



It is a well-known fact that in addition to any 
t] for any field k. In this setting, a few useful 



'This is a slightly more restrictive class than principle ideal domains. 

6 



ourselves to keep the relations basis in a row-reduced form - to ensure that the division algorithm produces 
nice normal forms of all elements - and the generators basis reduced with respect to the relations. 

As the reader recalls, Gaussian elimination puts a matrix into a normal form by clearing out row by row 
by using the leftmost column in the matrix with no non-zero entries above of the current row to cancel out 
any entries occurring later. When working with graded k[t]-modules, some care has to be taken to avoid 
reducing any columns using columns that do not exist in an early enough grading. This can be accomplished 
by sorting the columns in the matrix in order of rising degree, and only reducing to the right. Tracking the 
operations performed during a reduction allows for the computation of a kernel by reading off the operations 
that produced all-zero columns in the reduced matrix. 

Given a function / : M/Q — > N/P, where M/Q and N/P are both represented by lists of generators and 
lists of relations as discussed above, the following three constructions come naturally: 

Image modules. To compute im/, we compute the list f{m) for m G M. Since f{Q) C P, the relations 
on im / are the relations listed in P, and so im / = f{M)/P. 

Cokernel modules. To compute coker/, we need to compute {N/P) /{f {M/Q)). Again, f{Q) C P, and 
thus we get the resulting module by imposing all the images f{m) of the generators of M as additional 
relations. Hence, coker / = N/{f{M) U Q). 

Kernel modules. The kernel is, just as in |7|, more complex than the image and cokernel. The computation 
of the kernel module as well as its embedding into M/Q is computed in a two-step process. An element 
of ker/ is going to be represented by some element of the free module M such that f{m) G P. We can 
compute this by computing the kernel of the map M (B P ^ N given by {m,p) i— ;> /(ni) — p. This is 
a kernel of a map between free k[t]-modules, and thus computable with the matrix algorithm above. The 
actual generators in M are given by the projection onto the first summand ttm '■ {m, q) i— > m, and we write 
K^7Tm{Fk). 

The resulting elements form a free submodule of M giving generators of the kernel module. The kernel 
module will have relations, however, and not all relations in Q are going to be in the submodule K. Hence, 
for a full presentation, we need to find the relations that are actually in K, which we can do with another 
kernel computation. We consider the map K (BQ M given by {k,q) ^ k — q and compute its kernel. The 
projection ttk ■ (fc, q) ^-^ k gives us a set of relations. 

The constructions above, as well as Grobner basis approaches to handling free modules over polynomial 
rings are described in [l4] . 



3.2. Spectral sequence differentials. Using the algebraic operations described above, we describe an 
algorithm for computing persistent homology via the Mayer- Vietoris spectral sequence. This algorithm is 
iterative and converges to the correct cycles Z°° and boundaries B°° so that the isomorphism in Theorem 
□ holds. 

Converge occurs when the spectral sequence collapses. In particular, if the differential is the zero-map 
everywhere, then = Z"^, and P''+^ = , and we can verify that cT^^ is also going to be the zero-map 

everywhere, and this continues all the way up. One powerful use of this is that once the maps (F traverse 
the double complex in such a way that they only hit trivial modules, the corresponding cF has to be the 
zero-map. This happens, for instance, if the double complex is contained in a p x g bounding box where p 
and q are maximal dimensions of the complex and nerve respectively. In this case, the spectral sequence will 
collapse for r > min(p, g). At that point all the maps will map outside the box and so can only hit trivial 
modules. 

We first describe the initialization of the algorithm. Beginning with the double complex of Figure l2ja) , 
we first compute homology vertically, replacing each E'^ ^ with the quotient module Zp^^/B^^^ where Z^ ^^ = 
kerdp,^ and B^^^ = irndp ^^i. This computes, in effect, the persistent homology of the chain complex 
over each simplex in the nerve of the cover. We represent ^ by its segregated basis, as described in 
Section IS.ll maintaining the free modules Zp ^ and B^ ^ as well as the preboundary 1^ ^ defined by the 
equatioird^/^,^ = 

Next, we describe the iterative step. We will, for now, assume that we are given the differentials and 
describe their computation afterwards. 
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(a) (b) (c) 

Figure 4. (a) A sphere with constant function everywhere except one hotspot. The partition 
of the space is along the axes. Each partition is then thickened slighlty as described in Section [5. 1| 
to give the covering, (b) The nerve of the covering. Note that the faces have tetrahedra as all 4 
vertices of the square intersect, (c) The persistent barcode of the sphere, Hi is trivial so it is not 
shown. 

k[t]/t"' 

(a) page. (b) Computing cP on E^. (c) E'^ page, collapsed spectral se- 

quence. 

Figure 5. Example computation for the sphere. 



k[t]/f' 




k[t] 



n[t] 



At the r-th iteration, at each node (p, q), we compute the kernel module of dp ^, storing it as ^ and the 
image module of rfp+^.i storing this as B^^^. Since dT is a differential, we know that -Bp^^ C Z^'^^^ , and 
by reducing both generating sets we can find a good presentation of the quotient Zp'^^/B^'^^ — E^'^^^. 

As for the concrete maps d^, we compute them as we go along. In particular, d^ is the induced map on 
by giving the inclusion maps of deeper intersections into more shallow the coefficients from the nerve 
boundary map. The higher differentials d^ are computed according to the machinery of Section [2j In 
particular, the images of dp~^^ are retained. For an element x S Ep ^ in the kernel of rfp~^^ we can find a 
corresponding image element (ip~^x € ^p_r+2 g+r-i- Therefore, we can find some y € /p_r+2 q+r such that 
q+rV — '^^p~q ^ ■ Wc define d^x = d^y. This corresponds precisely to the description in Section [2j 
where the theoretical motivation can be found explaining exactly why these operations are all possible to 
perform and appropriate. In each iteration, we use the image and kernel module computations between 
finitely presented modules and maps that are computed in earlier stages. 

For a lower level description of the algorithm see Appendix [B} 

4. Explicit example 

Consider the sphere in Figure |4][a). Its function will introduce the hot spot and its surrounding area 
significantly later than the remaining sphere. Computing persistence over the entire sphere, we would 
expect to see a signature somewhat like the one in Figure |4]jc). 

We can divide the computation on the sphere into octants divided by the axis planes, as in Figure |4|^a) 



and (b). By growing each octant by the tubular neighbourhood method in Section 5.1 we get separate 
computational units, with contractible intersections of all higher orders. Hence, after computing persistent 
homology on all patches, and on pairwise, triple and quadruple intersections, we get a double complex of 
persistence modules shown in Figure [5jb). 

The exponents come from the 6 vertices of the octahedron generating 6 quadruple intersection. In each 
of these, there are 4 different triple intersections, generating a total of 24 triple intersections. Each of 
the 6 tetrahedra has 6 edges, corresponding to all the double intersections at each vertex - however, this 



overcounts each the 12 double intersections corresponding to edges of the octahedron. Thus, out of the 36 
double intersections, only 24 remain after compensating for the overcounting. Finally, each of the 8 faces of 
the octahedron corresponds to a single covering patch. 

After computing the corresponding £''^-page, the redundancies in the duplicate representations of all 
components have largely resolved, and we are left with a double complex of persistence modules shown in 
Figure [sjjb) . Here, the map k[t] — > k[t]/t"^ maps the generator onto the generator. The kernel of this map is 
t"^k[t] C k[t], and the image is the entire module. This gives our final page of the spectral sequence shown 
in Figure [sjc). Reading off the modules diagonally, we recover the persistent homology groups Hq = k[t] 
and H2 = t"^k[t], which is precisely what we computed before. 

5. Applications 

5.1. Computing partitions. The input to use the Mayer- Vietoris spectral sequence is a filtered simplicial 
complex is covered by subcomplexes. Given a point cloud X, and a method C* (such as Vietoris-Rips or 
Witness complexes) to construct filtered a simplicial complex C*X from X, we shall describe a few schemes 
to divide C*X into a covering by appropriate subcomplexes with small intersections. Minimizing the size of 
the intersections, we minimize the amount of work which must be done in higher pages. 

Recall that the closed star in a simplicial complex of a subset of its simplices is the subcomplex consisting 
of all simplices that intersect any simplex in the subset. We now state two lemmas and refer the reader to 
Appendix \K\ for the proofs. 

Lemma 2. Suppose C* has the property that any simplex has diameter at most e/2, and that the presence 
of any simplex is decided using points of a distance at most e/2 from any of the vertices of the simplex. 

Then the closed star of a subset S € C*X is going to be contained in C^T^S for T^^S the e-tubular 
neighbourhood of S in X, the collection of points cc G X such that d{x,S) < e. 

Lemma 3. Suppose X is partitioned into subsets Vj. Then a covering of C^X by subcomplexes is given by 
the collection of subcomplexes C*TeUj. 

We now describe some concrete methods for computing the covers. 
Cube partitions. Suppose X is a subset of some Euclidean space R". Cover X by a disjoint collection of 
axis-parallel cubes. Membership in a particular cell is quickly computed by comparing the coordinates of 
the points in X to boundaries for the cubes. A covering of C*X is given by the closed stars of the cells, easily 
computable as C+T^Uj for points in the tubular neighborhoods of the cells. 

Voronoi partitions. Pick some collection of landmark points L C X. Compute Voronoi regions Vi for 
£ & L. As above, we construct a covering of C*X by closed stars of the Voronoi cells, computable as C^,T^Vj. 
Geodesic Voronoi partitions. Pick some collection of landmark points L C X. Let C* be a method 
that constructs a flag complex on some graph induced by the geometry of X. Write dr for the geodesic 
graph distance, given by dr{x,y) equal to the least number of edges from a; to y in F. Write for the set 
{j/ e X : drixji) < dY{x,£')y£' £ L\£}, or in other words the set of points closer to £ than to any other 
point in L. 

As a covering by subcomplexes, we grow each to by including all immediate neighbours of the 
points in Vg. A covering of X by subcomplexes is given by the flag complexes on U^. 

5.2. Parallelism and stratified memory. Parallelisation with a covering is performed by splitting re- 
sponsibility for the chain complex of a single covering patch or patch intersection to a single computational 
task. These tasks communicate with each other to update the global state in the horizontal differential parts 
of the computation, and perform boundary map preimage computations internally. If a computational node 
is responsible for chains over a single simplex of the nerve complex, it will only communicate with nodes 
representing simplices in its closure and its star. This can be seen since in the first iteration, that node will 
only need to communicate with the nodes responsible for its faces and cofaces. In the second iteration, it 
must communicate with the faces of its faces and cofaces of its cofaces. 

To compute the total number of communications produced, we only consider communication at the level 
of the nerve of the covering rather than the size of the underlying simplicial complex. To avoid double 
counting, we only consider communication from a simplex into its closure. 
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Figure 6. Regions for Vietoris-Rips in an ambient Voronoi decomposition. From a pointcloud 
(far left) we pick three landmark points, and get a Voronoi decomposition of the ambient space 
(middle left). By growing each Voronoi region by e, we get a covering of the point cloud satisfying 
the conclusion in Lemma |3] (middle right). The e- Vietoris-Rips complexes in each of the covering 
patches and their intersections (far right) will provide a covering of the e- Vietoris-Rips complex of 
the entire point cloud by subcomplexes. 

To get a worst-case bound, we assume that the number of iterations required is equal to the maximal 
dimension of the nerv^ which we denote with r. Over all iterations, a d-simplex may produce no more 
than 0(2'*) communications. This follows from the assumption that r > d and the fact that the size of 
a d-simplex's closure is 2"^. If we denote the number of i-dimensional simplices in the nerve with Ki and 
assume r is the maximal dimension of the nerve, the total number of communications is X]I=i ^j2*. This is 
very rough and pessimistic bound. We intend to do a more detailed analysis in future work. 

We note that this approach will also be applicable in a single-processor context. Instead of distributing 
computational tasks on separate nodes, we can consider each subcomplex or subcomplex intersection a 
memory block. This way, the spectral sequence approach structures a computation, grouping accesses to 
related memory blocks into cohesive computational tasks, which induces swap schemes and schemes for 
stratified memory access. 

5.3. Nerve lemma. The Nerve lemma |19j Corollary 4G.3] is a powerful result from algebraic topology 
that has seen repeated use in computational topology - all the way from motivating why the commonly 
used filtered simplicial complex constructions capture the original topology Js] to generating particular new 
results. 

Quite often, when computational topologists use the Nerve lemma, quite a lot of effort is expended to 
provide the contractibility conditions required in the classical statement of the Nerve lemma. This is due 
to the most common form of the Nerve lemma guarantees both homological and homotopical equivalence 
between a covering and its nerve. As we are usually only interested in homology, we observe that the 
Mayer- Vietoris spectral sequence yields a proof of a homological Nerve lemma: 

Theorem 4 (Homology Nerve lemma). Suppose a space X is covered by a collectionU of suh spaces X = IJ^ \Ji, 
such that each Q^- Uj has trivial reduced homology in all degrees. Then _ff*(X) = H^,[M{U)). 

Proof. On the i?^-page, the non-reduced homology is concentrated in degree 0. Since by assumption, the 
reduced homology is trivial for each intersection, the non-reduced homology is just a copy of the ground 
field. The differential on the i?^-page is the differential of the nerve complex itself, and the result follows 
immediately. □ 

While this is a well-known fact in classical algebraic topology, we point this out to emphasize that the 
intersections need not be equipped with explicit contracting homotopies for the Nerve Lemma to work in 
homological computations. 



*As mentioned in Section |3.2[ the number of iterations is bounded by the minimum of the maximal dimensions of the nerve 
and the simplical complex 

10 



Persistent Nerve Lemma: The Nerve Lemma was extended to the persistence setting in [4] using similiar 
honiotopical arguments. Here, we give the spectral sequence version of it which rehes on a persistent acycHcity 
condition. 

From our framework also follows a persistent Nerve lemma. 

Definition 5. A space is persistently acyclic if it has persistent homology consisting of one free generator 
of Hq, and no homology in higher degrees. 

Theorem 6 (Persistent homology Nerve lemma). Suppose a filtered space X is covered by a collection lA of 
filtered suhspaces X = IJ^ \Ji, such that each Uj is persistently acyclic. Then H^{X) — H^{M{U)). 

Proof. We note that persistent homology is homology with coefficients in lk[t]. The persistent acyclicity 
condition ensures that homology has just one generator with the appropriate grading concentrated in the 
0-homology. As in Theorem |4j the sequence collapses after the page and the differential is equivalent 
to the boundary operator of the nerve. The result follows. □ 

5.4. Algebraic kernel, cokernel and image persistence. In the authors construct topological spaces 
that compute image, kernel and cokernel persistence modules for the map induced in persistent homology 
by a map between filtered topological spaces. We observe that the constructions in section |3.1| provide 
corresponding operations on a purely algebraic level, computing persistence modules for image, kernel and 
cokernel given a map of persistence modules, such as the one induced from a topological map. 

The notion of compatible filtration included as a condition in ff^ is reflected in the requirement that maps 
between persistence modules be graded module homomorphisms of degree 0, implying that the filtration 
inclusions on both sides commute with the map itself. 

6. Discussion 

We have constructed algorithms to compute with the spectral sequence generated by a double complex. 
In particular, we describe how to compute a spectral sequence generalizing the Mayer- Vietoris long exact 
sequence to compute both classical and persistent homology using a cover of the space and restricting 
computation to each of the covered regions and their intersections. 

This layout enables us to formulate parallelization schemes, with parallelism and independency guarantees 
present both along the Nerve complex axis of our computational scheme and along separate components of 
the covering. 

As an added bonus, the formalism of the spectral sequence approach suggests simplified proofs for several 
classical interesting results in the persistence community, and opens up a wide spectrum of interesting areas 
of research. In particular, we are interested in pursuing: 

Complexity bounds for parallelism: This paper presents the formalism, but does not attempt a 

complete description of the complexity behaviour of the proposed algorithms. 
HPC implementation: While the work was driven by a wish to take persistent homology to the 

world of computing clusters, our implementation so far is a serial one. A parallel implementation, 

and actual performance measurements would be very interesting. 
Cover generation schemes: While we give a few suggestions here, the construction of an adequate 

cover that uses or highlights inherent features of the dataset under analysis is going to be an important 

factor of these methods. A more complete catalogue of cover generation schemes is going to be 

important for the practical application of our techniques. 
Theoretical implications: We have already seen that several results have algebraic proofs that vastly 

simplify the arguments involved. We would be very interested in exploring the space of new or 

classical results that become accessible with a spectral sequence language. 
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Figure 7. The basic reduction: As in we reduce the matrix 7? left to right, always choosing 
the lowest possible pivot with shaded areas representing non-zero entries. In contrast with we 
represent each dimension separately, so the matrices are not upper triangular. As we reduce R, 
we perform the same operations on C. At the end, we perform a permutation to put the columns 
with pivots on the left (this is not done in practice). These represent the boundary basis and the 
corresponding columns in C are the preboundary basis. The columns in C corresponding to the 
empty columns in R are the cycle basis. 



Appendix A. Proofs of selected results 

Proof of Lemma^ Suppose C* determines the presence of a simplex using points at most e/2 away from 
any point in a simplex, and each simplex has diameter at most e/2. A simplex cr in C*X is in the closed star 
of S if at least one vertex of a is in S. By the condition on the diameter, all other points of a arc contained 
in Tg/2'S'. In addition, any point that can influence the decision on whether cr exists or not will be within 
e/2 of the furthest point from the vertex of a in S, and thus the simplex a is entirely determined by points 
at most e away from S, and thus by TeS*. □ 

Proof of Lemma If X is partitioned into subsets l)j , we may certainly partition C* X by the closed stars 
of the subsets. Indeed, since the Vj partition X, we are guaranteed that any vertex of a simplex lies in at 
least one of the Vj. Hence, in particular, the closed stars of all the Vj will contain all simplices in C*X. 
By Lemma [2j each such closed star is contained in C^T^Vj. Hence, the subcomplexes C^T^Vj cover X. □ 

Appendix B. Implementation 



In this section, we recount the algorithm in Section 3.2 in greater detail. In particular, the sections follow 
each other closely. 

The basic element of computation is the chain which is represented by a column vector with each row 
corresponding to a simplex and the entries corresponding to a coefficient k. Since the chains are homogeneous, 
it is sufficient to annotate each chain with its degree. Bases are a collection of linearly independent chains, 
and are represented as a matrix, where the vectors are sorted from left to right in order of increasing degre^ 

A persistent homology class is represented as a cycle chain and a boundary chain which represent the 
birth and death times of the class respectively. In terms of the presentation of the module, an all classes 
have a representative in the cycle basis where the degree corresponds to birth time. Inessential classes have 
a copy of this with higher degree in the boundary basis. In the quotient space, this corresponds precisely to 
the algebraic form described in In practice, we do not choose quite this representation. Rather than keep 
a separate boundary basis and cycle basis, inessential classes are annotated with two degrees to represent 
the boundary and cycle basis degrees. 

Initialization. : The initialization corresponds to running the standard persistence algorithm on each col- 
umn in the double complex with dP. As in [s], by reducing the boundary matrix and keeping track of 
column operations we can extract all the required bases at each node {p, q) (Figure [7]). We now perform one 
additional operation: at each node, we reduce the cycle basis with respect to the boundary basis as shown 
in Figure [8) We do this additional step to make operations at later stages simpler. 

To keep track of the preboundary we track the operations performed to produce the non-zero reduced 
columns of the boundary matrix. This way, we get a minimal spanning set of Ip ^ such that dP yields an 
isomorphism /^^^ — >■ Bp^g_i. At the end of this step, we have matrices representing Z^^^, ^ and Ip ^j for 
all (p, q). Zp g and part of Bp ,j together form a basis for Ep^^. We omit the boundary elements because 
is a subquotient of all earlier E^ , so any non-trivial class in E'' is represented earlier, and any relation from 
earlier remains a relation. Hence, any null-persistent class may show up as a relation, but only a relation 



'This corresponds to an increasing filtration index. 
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Figure 8. The reduction procedure: 1. The standard reduction into cycles and boundaries 2. 
Reduction of cycle basis with respect to cycle basis, the reduction proceeds left to right. Note that 
certain chains (non-essential classes) in Zp^ will be completely reduced. Since these are already in 
the span of the chosen basis we drop those columns. 3. The input to the next iteration ignores 
null persistent classes (in the picture we assume they happen in a left block, but this is only for 
visualization). 



killing its corresponding cycle, which also will no longer be relevant. Therefore they are still valid relations 
for the basis but cannot map to any element in Ep_^^i^^_^_j.. 

Iteration. : The general iteration computes the basis for Z^, , V , from which we compute E^^^. Note 
that these do not need to be stored over multiple iterations. We store the 0-th iteration from the initialization 
and then only the previous iteration: For the r-th iteration, we only have (r — l)-th iteration. Once completed 
we can forget the (r — l)-th iteration. As before, we first assume that all the r-differentials are given as linear 
maps. Practically, we compute the differential from the information in each subsequent iteration. This can 
be thought of as a subroutine we call at the beginning of each iteration and is described below. 

The rest of the iteration follows much as the initialization, we apply (T to Ep ^. Each element gives a 
chain x G q+r- We first reduce x with respect to the existing boundary chains. In Figure [9j we show 

how we perform a reduction step. We reduce the chain x with respect to y. The underlying operation is the 
same as for regular reduction in Gaussian elimination: we multiply the pivot column with the appropriate 
field coefhcient and add it to the chain we are reducing. The only additional book-keeping we must do is to 
take care that the degrees of the cycle and boundary chains are appropriately updated. 

Assume that both x and y are inessential classes, so each has a generator and a boundary relation. To 
illustrate this. Figure [9] shows the mapping between two persistence interval, representing the degrees of the 
generators and relations. Such a mapping would generically map some interval [c, d] to an interval [o, 6], with 
a < c < b < d. This corresponds to the requirement that the maps be graded so generators map to multiples 
of generators and relations map to relations. The kernel of the map is represented by x shifted to the degree 
of the relation of y. The cokernel is represented by y with the relation at the degree of the generator of 
X. The image is represented by y, with a degree shift up to the generator of x. The transformation of the 
intervals is shown in Figure [9j 

This corresponds to only one reduction step, we repeat this procedure completely reducing x in terms of 
the basis in Ep_^_^_i ^_|_^ and we do this for all elements in E^ ^. If in the reduction of an element, an interval 
is reduced to zero (i.e. [c, c]), further reduction is unnecessary, since it cannot be in the kernel, nor can can 
it map to anything further in the boundary. The kernel of the map is given by all the elements which have 
non-null persistence and form a basis for Z^'^^. Likewise, in the target space (assuming ^pj^^-i-i g-(-r 
already been computed), the remaining non-null persistent classes form a basis iSpl^+i.ij+r- We note that 
we can compute the basis for the boundary and kernel independently. However, then we must reduce the 
kernel basis with repect to the boundary basis (just as in the initialization). 

Computing the r- differential. : For r = 1, the differential is given by d^, where the map on homology is 
induced directly from the chain map. 

For r > 1, we use the following procedure to compute d"^ shown in Figure [TO} In the previous iteration, 
we store the image of d^~^Ep~^. More precisely, we store the representation of the image in the basis of 
q+r-i- this basis, performing the lift to /p_r+2 q+r can be done through the stored relation. The 
lift exists because from Section[2j we know that d^^^E^ ^ £ Bp_j.^2 q+r-i- 
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Figure 9. To perform a reduction of the x £ cT Ep^r-i,q-r with an element of the basis of 
y £ Ep g, we multiply by the appropriate coefhcient t'^~°'k G k[t] and add the two. The condition 
a < c < b < d follows from the grading of the maps. We update the degrees of cT Ep^r-i,q-r to [b, d] 
and Ep^^ to [a,c]. Note that if we get [d,d] or [a, a], these elements have become null persistent. 




Figure 10. To compute d'', we first find the representation of the d"" ^-Ep.q in terms of 

r+i,5+r-i- Since we know this is in the image of d , we can lift this vector to 
and then compute the chain in terms of Ip-r+i,q+r- Applying d^ , we get a chain we can reduce in 
terms of Ep_r+i^g+r, thereby computing d^ 



The last step is applying the map, which gives the chain we then reduce with respect to the i?p_r+i,q+r 
basis. By construction this satisfies equations [2] and [3j This is precisely one step of the staircase shown in 
Figure [3] 
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